A FastICA Algorithm for Non-negative Independent Component Analysis
Authors: Zhijian Yuan and Erkki Oja
Abstract
The non-negative ICA problem is defined here by the constraint that the sources are non-negative with probability one. This case occurs in many practical applications, such as spectral or image analysis. It has been shown in [10] that there is a straightforward way to find the sources: if one whitens the non-zero-mean observations and rotates them to positive factors, then these must be the original sources. A fast algorithm resembling the FastICA method is suggested here, rigorously analyzed, and tested on a simple image separation example.

1 The Non-negative ICA Problem

The basic linear instantaneous ICA mixing model $x = As$ can be considered solved, with a multitude of practical algorithms and software available; for reviews, see [1, 3]. However, if one makes further assumptions that restrict or extend the model, there is still room for new analysis and solution methods. One such assumption is positivity or non-negativity of the sources and possibly of the mixing coefficients; for applications, see [9, 5, 13, 2]. Such a constraint is usually called positive matrix factorization [8] or non-negative matrix factorization [4]. We refer to the combination of the non-negativity and independence assumptions on the sources as non-negative independent component analysis.

Recently, Plumbley [10, 11] considered the non-negativity assumption on the sources and introduced an alternative way of approaching the ICA problem, as follows. He calls a source $s_i$ non-negative if $\Pr(s_i < 0) = 0$, and well-grounded if $\Pr(s_i < \delta) > 0$ for any $\delta > 0$; i.e., $s_i$ has non-zero pdf all the way down to zero. The following key result was proven in [10]:

Theorem 1. Suppose that $s$ is a vector of non-negative, well-grounded, independent, unit-variance sources $s_i$, $i = 1, \dots, n$, and $y = Qs$, where $Q$ is a square orthonormal rotation, i.e. $QQ^T = I$. Then $Q$ is a permutation matrix, i.e. the elements $y_j$ of $y$ are a permutation of the sources $s_i$, if and only if all $y_j$ are non-negative.
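Theorem 1 can be checked numerically on a toy case. The sketch below rests on illustrative assumptions that are not from the paper: two independent sources uniform on $[0, \sqrt{12}]$ (non-negative, well-grounded, unit variance), a rotation by a hypothetical angle of 0.3 rad, and the 2×2 swap matrix as the permutation. A generic rotation produces some negative outputs, while the permutation keeps every output non-negative.

```python
import math
import random


def apply_2x2(Q, s):
    """Return y = Q s for a 2x2 matrix Q (list of rows) and a 2-vector s."""
    return [Q[0][0] * s[0] + Q[0][1] * s[1],
            Q[1][0] * s[0] + Q[1][1] * s[1]]


def min_output(Q, samples):
    """Smallest component of y = Q s over all source samples."""
    return min(min(apply_2x2(Q, s)) for s in samples)


rng = random.Random(0)
a = math.sqrt(12.0)  # U(0, a) has variance a**2 / 12 = 1 and is well-grounded
samples = [[rng.uniform(0.0, a), rng.uniform(0.0, a)] for _ in range(10000)]

theta = 0.3  # hypothetical rotation angle; this rotation is not a permutation
rotation = [[math.cos(theta), -math.sin(theta)],
            [math.sin(theta), math.cos(theta)]]
swap = [[0.0, 1.0], [1.0, 0.0]]  # orthogonal permutation matrix

print(min_output(rotation, samples) < 0)  # True: some outputs are negative
print(min_output(swap, samples) >= 0)     # True: outputs are the sources themselves
```

Only the sign pattern of the outputs is inspected, which is why the criterion can be used without knowing $Q$.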
This work was supported by the Academy of Finland as part of its Center of Excellence project “New Information Processing Principles”.
C.G. Puntonet and A. Prieto (Eds.): ICA 2004, LNCS 3195, pp. 1–8, 2004. © Springer-Verlag Berlin Heidelberg 2004
Zhijian Yuan and Erkki Oja

The result of Theorem 1 can be used for a simple solution of the non-negative ICA problem. The sources are of course unknown, so $Q$ cannot be found directly. However, an arbitrary rotation of $s$ can also be expressed as a rotation of a pre-whitened observation vector. Denote this vector by $z = Vx$, with $V$ the whitening matrix. Assume that the dimensionality of $z$ has been reduced to that of $s$ in the whitening, which is always possible in the overdetermined case (the number of sensors is not smaller than the number of sources). Now $z = VAs$. Because both $z$ and $s$ have unit covariance matrices (for $s$, this is assumed in Theorem 1), the matrix $VA$ must be square orthogonal. This holds even when $s$ and $z$ have non-zero means. We can write

$$y = Qs = Q(VA)^T z = Wz,$$

where the matrix $W = Q(VA)^T$ is a new parametrization of the problem. The key fact is that $W$ is orthogonal, because it is the product of two orthogonal matrices, $Q$ and $(VA)^T$. By Theorem 1, to find the sources it now suffices to find an orthogonal matrix $W$ for which $y = Wz$ is non-negative. The elements of $y$ are then the sources.

It was further suggested in [10] that a suitable cost function for actually finding the rotation could be constructed as follows. Suppose we have an output truncated at zero, $y^+ = (y_1^+, \dots, y_n^+)$ with $y_i^+ = \max(0, y_i)$, and we construct a re-estimate of $z = W^T y$ given by $\hat{z} = W^T y^+$. Then a suitable cost function is

$$J(W) = E\{\|z - \hat{z}\|^2\} = E\{\|z - W^T y^+\|^2\}. \tag{1}$$

Due to the orthogonality of $W$, this is in fact equal to

$$J(W) = E\{\|y - y^+\|^2\} = \sum_{i=1}^{n} E\{\min(0, y_i)^2\}. \tag{2}$$

Obviously, the value is zero if $W$ is such that all the $y_i$ are non-negative.
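The whitening step and the equality of the costs (1) and (2) can be checked on synthetic data. This is a minimal pure-Python sketch in two dimensions, under assumptions of our own: the whitening matrix is taken as $V = L^{-1}$ from a Cholesky factorization $C = LL^T$ of the sample covariance (any whitening matrix would do), the mixing matrix and rotation angle are arbitrary illustrative values, and expectations are replaced by sample averages. The mean is used only to estimate the covariance; it is not removed from $z$.

```python
import math
import random


def mat_vec(M, v):
    """Matrix-vector product for a matrix stored as a list of rows."""
    return [sum(M[i][j] * v[j] for j in range(len(v))) for i in range(len(M))]


def transpose(M):
    return [list(row) for row in zip(*M)]


def whitening_matrix(xs):
    """V = L^{-1}, where C = L L^T is the Cholesky factorization of the 2x2
    sample covariance of xs, so Cov(V x) = I. The mean of x is not removed."""
    n = len(xs)
    m = [sum(x[k] for x in xs) / n for k in range(2)]
    c = [[sum((x[i] - m[i]) * (x[j] - m[j]) for x in xs) / n for j in range(2)]
         for i in range(2)]
    l11 = math.sqrt(c[0][0])
    l21 = c[1][0] / l11
    l22 = math.sqrt(c[1][1] - l21 ** 2)
    return [[1.0 / l11, 0.0], [-l21 / (l11 * l22), 1.0 / l22]]


def cost_reconstruction(W, zs):
    """Eq. (1): sample mean of ||z - W^T y^+||^2 with y = W z."""
    Wt = transpose(W)
    total = 0.0
    for z in zs:
        y_plus = [max(0.0, yi) for yi in mat_vec(W, z)]
        z_hat = mat_vec(Wt, y_plus)
        total += sum((zi - hi) ** 2 for zi, hi in zip(z, z_hat))
    return total / len(zs)


def cost_negative_part(W, zs):
    """Eq. (2): sample mean of sum_i min(0, y_i)^2 with y = W z."""
    return sum(sum(min(0.0, yi) ** 2 for yi in mat_vec(W, z)) for z in zs) / len(zs)


rng = random.Random(1)
a = math.sqrt(12.0)  # unit-variance, non-negative uniform sources
sources = [[rng.uniform(0.0, a), rng.uniform(0.0, a)] for _ in range(2000)]
A = [[1.0, 0.5], [0.3, 1.0]]  # illustrative mixing matrix
xs = [mat_vec(A, s) for s in sources]
V = whitening_matrix(xs)
zs = [mat_vec(V, x) for x in xs]  # whitened but non-centered observations

theta = 0.7  # an arbitrary orthogonal W
W = [[math.cos(theta), -math.sin(theta)], [math.sin(theta), math.cos(theta)]]
print(cost_reconstruction(W, zs), cost_negative_part(W, zs))  # equal up to rounding
```

The two printed values agree because, for orthogonal $W$, $z - W^T y^+ = W^T(y - y^+)$ and $W^T$ preserves norms.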
The minimization of this cost function by various numerical algorithms was suggested in [11, 12, 7]. In [11], explicit axis rotations as well as geodesic search over the Stiefel manifold of orthogonal matrices were used. In [12], the cost function (1) was treated as a special case of “nonlinear PCA”, for which an algorithm had earlier been suggested by one of the authors [6]. Finally, in [7], it was shown that the cost function (2) is a Liapunov function for a certain matrix flow in the Stiefel manifold, providing global convergence. However, gradient-type learning rules converge slowly. It is therefore tempting to develop a “fast” numerical algorithm for this problem, perhaps along the lines of the well-known FastICA method [3]. In this paper, such an algorithm is suggested and its convergence is analyzed theoretically.

2 The Classical FastICA Algorithm

Under the whitened zero-mean demixing model $y = Wz$, the classical FastICA algorithm finds the extrema of a generic cost function $E\{G(w^T z)\}$, where $w$ is one of the rows of the demixing matrix $W$. The cost function can be, e.g., a normalized cumulant or an approximation of the marginal entropy, which is minimized in ICA in order to find maximally non-Gaussian projections $w^T z$. Under fairly weak assumptions, the true independent sources are among the extrema of $E\{G(w^T z)\}$ [3]. FastICA updates $w$ according to the following rule:

$$w \leftarrow E\{z\, g(w^T z)\} - E\{g'(w^T z)\}\, w. \tag{3}$$

Here $g$ is the derivative of $G$, and $g'$ is the derivative of $g$. After (3), the vectors $w$ are orthogonalized either in deflation mode or symmetrically. The algorithm typically converges in a small number of steps to a demixing matrix $W$, and $y$ becomes a permutation of the source vector $s$ with arbitrary signs.
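The one-unit update (3) can be sketched in a few lines. This is an illustrative pure-Python version, not a reference FastICA implementation: it assumes the kurtosis-based choice $G(u) = u^4/4$, so that $g(u) = u^3$ and $g'(u) = 3u^2$, renormalizes $w$ to unit length after each step, and omits the deflation or symmetric orthogonalization needed when several units are estimated. The demo data are zero-mean, unit-variance uniform sources mixed by a hypothetical rotation, so they are already approximately white.

```python
import math
import random


def fastica_step(w, zs):
    """One update of Eq. (3) with g(u) = u**3 and g'(u) = 3 u**2 on zero-mean
    whitened samples zs, followed by renormalization of w to unit length."""
    n = len(zs)
    new_w = [0.0] * len(w)
    mean_gprime = 0.0
    for z in zs:
        u = sum(wi * zi for wi, zi in zip(w, z))
        for k, zk in enumerate(z):
            new_w[k] += zk * u ** 3 / n   # accumulates E{z g(w^T z)}
        mean_gprime += 3.0 * u * u / n    # accumulates E{g'(w^T z)}
    new_w = [acc - mean_gprime * wk for acc, wk in zip(new_w, w)]
    norm = math.sqrt(sum(v * v for v in new_w))
    return [v / norm for v in new_w]


rng = random.Random(2)
b = math.sqrt(3.0)  # U(-b, b) has zero mean and unit variance
theta = 0.5         # hypothetical orthogonal mixing: rotation by theta
R = [[math.cos(theta), -math.sin(theta)], [math.sin(theta), math.cos(theta)]]
zs = []
for _ in range(5000):
    s = [rng.uniform(-b, b), rng.uniform(-b, b)]
    zs.append([R[0][0] * s[0] + R[0][1] * s[1], R[1][0] * s[0] + R[1][1] * s[1]])

w = [1.0, 0.0]
for _ in range(20):
    w = fastica_step(w, zs)
# w should now lie close to +/- one column of R, i.e. one source direction.
print(w)
```

With this nonlinearity the sign of $w$ may flip from one iteration to the next, which matches the remark above that the sources are recovered only up to sign.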
3 The Non-negative FastICA Algorithm

For non-negative independent components, the task is to find an orthogonal matrix $W$ such that $y = Wz$ is non-negative, with $z$ the pre-whitened vector. Here the classical FastICA algorithm faces two problems. First, non-negative sources cannot have zero means, so the mean values must be explicitly included in the analysis. Second, in FastICA the function $g$ in equation (3) is assumed to be an odd function, the derivative of the even function $G$. If this condition is not satisfied, FastICA as such may not work. Applying FastICA to minimizing the cost function (2), we see that $G(y) = \min(0, y)^2$, whose negative derivative (dropping the factor 2) is

$$g_-(y) = -\min(0, y) = \begin{cases} -y, & y < 0, \\ 0, & y \ge 0. \end{cases} \tag{4}$$

This function is not odd, so it does not satisfy the condition for FastICA.

To correct these problems, we first use non-centered but whitened data $z$, which satisfies $E\{(z - E\{z\})(z - E\{z\})^T\} = I$. Second, we add a control parameter $\mu$ to the FastICA update rule (3), giving the following update rule:

$$w \leftarrow E\{(z - E\{z\})\, g_-(w^T z)\} - \mu\, E\{g_-'(w^T z)\}\, w, \tag{5}$$

where $g_-'$ is the derivative of $g_-$. This formulation shows the similarity to the classical FastICA algorithm. Substituting the function $g_-$ from (4) simplifies the terms; for example, $E\{g_-'(w^T z)\} = -E\{1 \mid w^T z < 0\}\, P\{w^T z < 0\}$. The scalar $P\{w^T z < 0\}$, which appears in both terms of (5), can be dropped because the vector $w$ will be normalized anyway. In practice, expectations are replaced by sample averages. In (5), $\mu$ is a parameter determined by

$$\mu = \min_{z \in \Delta} \frac{E\{(z - E\{z\})\, w^T z \mid w^T z < 0\}^T z}{E\{1 \mid w^T z < 0\}\, w^T z}. \tag{6}$$

Here the set $\Delta = \{z : z^T z(0) = 0\}$, with $z(0)$ the vector satisfying $\|z(0)\| = 1$ and $w^T z(0) = \max(w^T z)$. Computing this parameter is somewhat computationally heavy, but on the other hand, the algorithm now converges in a fixed number of steps. The non-negative FastICA algorithm is shown in Table 1.

Table 1. The Non-negative FastICA algorithm for estimating several ICs.
1. Whiten the data to get vector $z$.
2. Set counter $p \leftarrow 1$.
3. Choose an initial vector $w_p$ of unit norm, and orthogonalize it as
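A single sample-average update of (5) can be sketched as follows. This is an illustration only, with assumptions beyond the text: $\mu$ is passed in by the caller instead of being computed from (6), the function returns $w$ unchanged when no sample gives $w^T z < 0$ (both terms of (5) then vanish), and the orthogonalization of step 3 of Table 1, which is cut off above, is not implemented. Averaging over all samples rather than only those with $w^T z < 0$ multiplies both terms by the same factor $P\{w^T z < 0\}$, which the normalization removes, as noted in the text.

```python
import math


def g_minus(y):
    """Eq. (4): g_-(y) = -min(0, y)."""
    return -min(0.0, y)


def g_minus_prime(y):
    """Derivative of g_-: -1 for y < 0 and 0 for y >= 0."""
    return -1.0 if y < 0 else 0.0


def nonneg_fastica_step(w, zs, mu):
    """One update of Eq. (5) on non-centered whitened samples zs, then
    renormalization. mu is supplied by the caller (hypothetical: Eq. (6) is not computed)."""
    n, dim = len(zs), len(w)
    mean_z = [sum(z[k] for z in zs) / n for k in range(dim)]
    first = [0.0] * dim   # sample estimate of E{(z - E{z}) g_-(w^T z)}
    second = 0.0          # sample estimate of E{g_-'(w^T z)}
    for z in zs:
        y = sum(wi * zi for wi, zi in zip(w, z))
        for k in range(dim):
            first[k] += (z[k] - mean_z[k]) * g_minus(y) / n
        second += g_minus_prime(y) / n
    new_w = [f - mu * second * wk for f, wk in zip(first, w)]
    norm = math.sqrt(sum(v * v for v in new_w))
    if norm == 0.0:
        return list(w)  # no negative outputs: nothing to update
    return [v / norm for v in new_w]


# Usage with made-up (not actually whitened) samples and an arbitrary mu.
zs = [[1.0, 2.0], [3.0, -1.0], [-0.5, 0.5], [2.0, 1.0]]
w = nonneg_fastica_step([1.0, 0.0], zs, mu=1.0)
print(w)
```

Only samples with a negative projection $w^T z$ contribute to either term, so the update is driven entirely by the outputs that violate non-negativity.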
Publication date: 2004